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Preface 

Facing the unusual popularity of wavelets in sciences, I began to 
wonder whether this was just another fashion that would fade away 
with time. After several years of research and teaching on this topic, 
and surviving the painful experience of writing a book, you may rightly 
expect that I have calmed my anguish. This might be the natural self- 
delusion affecting any researcher studying his corner of the world, but 
there might be more. 

Wavelets are not based on a “bright new idea”, but on concepts 
that already existed under various forms in many different fields. The 
formalization and emergence of this “wavelet theory” is the result of a 
multidisciplinary effort that brought together mathematicians, physi- 
cists and engineers, who recognized that they were independently devel- 
oping similar ideas. For signal processing, this connection has created 
a flow of ideas that goes well beyond the construction of new bases or 
transforms. 

A Personal Experience At one point, you cannot avoid mention- 
ing who did what. For wavelets, this is a particularly sensitive task, 
risking aggressive replies from forgotten scientific tribes arguing that 
such and such results originally belong to them. As I said, this wavelet 
theory is truly the result of a dialogue between scientists who often met 
by chance, and were ready to listen. From my totally subjective point 
of view, among the many researchers who made important contribu- 
tions, I would like to single out one, Yves Meyer, whose deep scientific- 
vision was a major ingredient sparking this catalysis. It is ironic to 
see a French pure mathematician, raised in a Bourbakist culture where 
applied meant trivial, playing a central role along this wavelet bridge 
between engineers and scientists coming from different disciplines. 

When beginning my Ph.D. in the U.S., the only project I had in 
mind was to travel, never become a researcher, and certainly never 
teach. I had clearly destined myself to come back to France, and quickly 
begin climbing the ladder of some big corporation. Ten years later, I 
was still in the U.S., the mind buried in the hole of some obscure 
scientific problem, while teaching in a university. So what went wrong? 
Probably the fact that I met scientists like Yves Meyer, whose ethic 




CONTENTS 



11 



and creativity have given me a totally different view of research and 
teaching. Trying to communicate this flame was a central motivation 
for writing this book. I hope that you will excuse me if my prose ends 
up too often in the no man’s land of scientific neutrality. 

A Few Ideas Beyond mathematics and algorithms, the book carries 
a few important ideas that I would like to emphasize. 

• Time-frequency wedding Important information often appears 
through a simultaneous analysis of the signal’s time and fre- 
quency properties. This motivates decompositions over elemen- 
tary “atoms” that are well concentrated in time and frequency. It 
is therefore necessary to understand how the uncertainty principle 
limits the flexibility of time and frequency transforms. 

• Scale for zooming Wavelets are scaled waveforms that measure 
signal variations. By traveling through scales, zooming proce- 
dures provide powerful characterizations of signal structures such 
as singularities. 

• More and more bases Many orthonormal bases can be designed 
with fast computational algorithms. The discovery of filter banks 
and wavelet bases has created a popular new sport of basis hunt- 
ing. Families of orthogonal bases are created every day. This 
game may however become tedious if not motivated by applica- 
tions. 

• Sparse representations An orthonormal basis is useful if it de- 
fines a representation where signals are well approximated with 
a few non-zero coefficients. Applications to signal estimation in 
noise and image compression are closely related to approximation 
theory. 

• Try it non-linear and diagonal Linearity has long predominated 
because of its apparent simplicity. We are used to slogans that 
often hide the limitations of “optimal” linear procedures such as 
Wiener filtering or Karhunen-Loeve bases expansions. In sparse 
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representations, simple non-linear diagonal operators can con- 
siderably outperform “optimal” linear procedures, and fast al- 
gorithms are available. 

WaveLab and LastWave Toolboxes Numerical experimentations 
are necessary to fully understand the algorithms and theorems in this 
book. To avoid the painful programming of standard procedures, nearly 
all wavelet and time-frequency algorithms are available in the Wave- 
Lab package, programmed in Matlab. WaveLab is a freeware soft- 
ware that can be retrieved through the Internet. The correspondence 
between algorithms and WaveLab subroutines is explained in Ap- 
pendix B. All computational figures can be reproduced as demos in 
WaveLab. LastWave is a wavelet signal and image processing envi- 
ronment, written in C for Xll/Unix and Macintosh computers. This 
stand-alone freeware does not require any additional commercial pack- 
age. It is also described in Appendix B. 



Teaching This book is intended as a graduate textbook. It took 
form after teaching “wavelet signal processing” courses in electrical en- 
gineering departments at MIT and Tel Aviv University, and in applied 
mathematics departments at the Courant Institute and Ecole Polytech- 
nique (Paris). 

In electrical engineering, students are often initially frightened by 
the use of vector space formalism as opposed to simple linear algebra. 
The predominance of linear time invariant systems has led many to 
think that convolutions and the Fourier transform are mathematically 
sufficient to handle all applications. Sadly enough, this is not the case. 
The mathematics used in the book are not motivated by theoretical 
beauty; they are truly necessary to face the complexity of transient 
signal processing. Discovering the use of higher level mathematics hap- 
pens to be an important pedagogical side-effect of this course. Numeri- 
cal algorithms and figures escort most theorems. The use of WaveLab 
makes it particularly easy to include numerical simulations in home- 
work. Exercises and deeper problems for class projects are listed at the 
end of each chapter. 

In applied mathematics, this course is an introduction to wavelets 
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but also to signal processing. Signal processing is a newcomer on the 
stage of legitimate applied mathematics topics. Yet, it is spectacularly 
well adapted to illustrate the applied mathematics chain, from problem 
modeling to efficient calculations of solutions and theorem proving. Im- 
ages and sounds give a sensual contact with theorems, that can wake up 
most students. For teaching, formatted overhead transparencies with 
enlarged figures are available on the Internet: 

http: //www. cmap .polytechnique . f r/ ^mallat/Wavetour_f igures/ . 

Francois Chaplais also offers an introductory Web tour of basic concepts 
in the book at 

http : / / cas . ensmp . f r/^chaplais/Wavetour_presentation/. 

Not all theorems of the book are proved in detail, but the important 
techniques are included. I hope that the reader will excuse the lack 
of mathematical rigor in the many instances where I have privileged 
ideas over details. Few proofs are long: they are concentrated to avoid 
diluting the mathematics into many intermediate results, which would 
obscure the text. 

Course Design Level numbers explained in Section 1.5.2 can help in 
designing an introductory or a more advanced course. Beginning with 
a review of the Fourier transform is often necessary. Although most 
applied mathematics students have already seen the Fourier transform, 
they have rarely had the time to understand it well. A non-technical re- 
view can stress applications, including the sampling theorem. Refresh- 
ing basic mathematical results is also needed for electrical engineering 
students. A mathematically oriented review of time-invariant signal 
processing in Chapters 2 and 3 is the occasion to remind the student of 
elementary properties of linear operators, projectors and vector spaces, 
which can be found in Appendix A. For a course of a single semester, 
one can follow several paths, oriented by different themes. Here are few 
possibilities. 

One can teach a course that surveys the key ideas previously out- 
lined. Chapter 4 is particularly important in introducing the concept of 
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local time-frequency decompositions. Section 4.4 on instantaneous fre- 
quencies illustrates the limitations of time-frequency resolution. Chap- 
ter 6 gives a different perspective on the wavelet transform, by relating 
the local regularity of a signal to the decay of its wavelet coefficients 
across scales. It is useful to stress the importance of the wavelet vanish- 
ing moments. The course can continue with the presentation of wavelet 
bases in Chapter 7, and concentrate on Sections 7. 1-7.3 on orthogonal 
bases, multiresolution approximations and filter bank algorithms in one 
dimension. Linear and non-linear approximations in wavelet bases are 
covered in Chapter 9. Depending upon students’ backgrounds and in- 
terests, the course can finish in Chapter 10 with an application to signal 
estimation with wavelet thresholding, or in Chapter 11 by presenting 
image transform codes in wavelet bases. 

A different course may study the construction of new orthogonal 
bases and their applications. Beginning with the wavelet basis, Chap- 
ter 7 also gives an introduction to filter banks. Continuing with Chapter 
8 on wavelet packet and local cosine bases introduces different orthog- 
onal tilings of the time-frequency plane. It explains the main ideas 
of time-frequency decompositions. Chapter 9 on linear and non-linear 
approximation is then particularly important for understanding how to 
measure the efficiency of these bases, and for studying best bases search 
procedures. To illustrate the differences between linear and non-linear 
approximation procedures, one can compare the linear and non-linear 
thresholding estimations studied in Chapter 10. 

The course can also concentrate on the construction of sparse repre- 
sentations with orthonormal bases, and study applications of non-linear 
diagonal operators in these bases. It may start in Chapter 10 with a 
comparison of linear and non-linear operators used to estimate piece- 
wise regular signals contaminated by a white noise. A quick excursion 
in Chapter 9 introduces linear and non-linear approximations to ex- 
plain what is a sparse representation. Wavelet orthonormal bases are 
then presented in Chapter 7, with special emphasis on their non-linear 
approximation properties for piecewise regular signals. An application 
of non-linear diagonal operators to image compression or to threshold- 
ing estimation should then be studied in some detail, to motivate the 
use of modern mathematics for understanding these problems. 

A more advanced course can emphasize non-linear and adaptive sig- 
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nal processing. Chapter 5 on frames introduces flexible tools that are 
useful in analyzing the properties of non-linear representations such 
as irregularly sampled transforms. The dyadic wavelet maxima repre- 
sentation illustrates the frame theory, with applications to multiscale 
edge detection. To study applications of adaptive representations with 
orthonormal bases, one might start with non-linear and adaptive ap- 
proximations, introduced in Chapter 9. Best bases, basis pursuit or 
matching pursuit algorithms are examples of adaptive transforms that 
construct sparse representations for complex signals. A central issue is 
to understand to what extent adaptivity improves applications such as 
noise removal or signal compression, depending on the signal properties. 



Responsibilities This book was a one-year project that ended up in 
a never will finish nightmare. Ruzena Bajcsy bears a major responsibil- 
ity for not encouraging me to choose another profession, while guiding 
my first research steps. Her profound scientific intuition opened my 
eyes to and well beyond computer vision. Then of course, are all the 
collaborators who could have done a much better job of showing me 
that science is a selfish world where only competition counts. The 
wavelet story was initiated by remarkable scientists like Alex Gross- 
maim. whose modesty created a warm atmosphere of collaboration, 
where strange new ideas and ingenuity were welcome as elements of 
creativity. 

I am also grateful to the few people who have been willing to work 
with me. Some have less merit because they had to finish their de- 
gree but others did it on a voluntary basis. I am thinking of Amir 
Averbuch, Emmanuel Bacry, Francois Bergeaud, Geoff Davis, Davi 
Geiger, Frederic Falzon, Wen Liang Hwang, Hamid Krim, George Pa- 
panicolaou, Jean- Jacques Slotine, Alan Willsky, Zifeng Zhang and Sifen 
Zhong. Their patience will certainly be rewarded in a future life. 

Although the reproduction of these 600 pages will probably not kill 
many trees, I do not want to bear the responsibility alone. After four 
years writing and rewriting each chapter, I first saw the end of the 
tunnel during a working retreat at the Fondation des Trebles, which 
offers an exceptional environment to think, write and eat in Provence. 
With WaveLab, David Donoho saved me from spending the second 
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half of my life programming wavelet algorithms. This opportunity was 
beautifully implemented by Maureen Clerc and Jerome Ivalifa, who 
made all the figures and found many more mistakes than I dare say. 
Dear reader, you should thank Barbara Burke Hubbard, who corrected 
my Frenglisli (remaining errors are mine), and forced me to modify 
many notations and explanations. I thank her for doing it with tact 
and humor. My editor, Chuck Glaser, had the patience to wait but I 
appreciate even more his wisdom to let me think that I would finish in 
a year. 

She will not read this book, yet my deepest gratitude goes to Branka 
with whom life has nothing to do with wavelets. 

Stephane Mallat 
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Second Edition 

Before leaving this Wavelet Tour , I naively thought that I should 
take advantage of remarks and suggestions made by readers. This al- 
most got out of hand, and 200 pages ended up being rewritten. Let me 
outline the main components that were not in the first edition. 

• Bayes versus Minimax Classical signal processing is almost en- 
tirely built in a Bayes framework, where signals are viewed as 
realizations of a random vector. For the last two decades, re- 
searchers have tried to model images with random vectors, but in 
vain. It is thus time to wonder whether this is really the best ap- 
proach. Minimax theory opens an easier avenue for evaluating the 
performance of estimation and compression algorithms. It uses 
deterministic models that can be constructed even for complex 
signals such as images. Chapter 10 is rewritten and expanded to 
explain and compare the Bayes and minimax points of view. 

• Bounded Variation Signals Wavelet transforms provide sparse 
representations of piecewise regular signals. The total variation 
norm gives an intuitive and precise mathematical framework in 
which to characterize the piecewise regularity of signals and im- 
ages. In this second edition, the total variation is used to compute 
approximation errors, to evaluate the risk when removing noise 
from images, and to analyze the distortion rate of image trans- 
form codes. 

• Normalized Scale Continuous mathematics give asymptotic re- 
sults when the signal resolution N increases. In this framework, 
the signal support is fixed, say [0,1], and the sampling interval 
iV _1 is progressively reduced. In contrast, digital signal process- 
ing algorithms are often presented by normalizing the sampling 
interval to 1, which means that the support [0, iV] increases with 
N. This new edition explains both points of views, but the figures 
now display signals with a support normalized to [0, 1], in accor- 
dance with the theorems. The scale parameter of the wavelet 
transform is thus smaller than 1. 
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• Video Compression Compressing video sequences is of prime im- 
portance for real time transmission with low-bandwidth channels 
such as the Internet or telephone lines. Motion compensation 
algorithms are presented at the end of Chapter 11. 
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Notation 



if, 9) 
ll/ll 

/ M = 0(^[n]) 
/N = 0(0 M) 
/N ~ #M 

A < +00 

A»R 

z* 

L* r J 

n mod N 



Inner product (A.6). 

Norm (A. 3). 

Order of: there exists K such that f[n] < Kg[n\. 
Small order of: lim n _> +0O = 0. 

Equivalent to: f[n] = 0(g[n]) and g[n] = 0(f[n\). 
A is finite. 

A is much bigger than B. 

Complex conjugate of z G C. 

Largest integer n < x. 

Smallest integer n > x. 

Remainder of the integer division of n modulo N. 



Sets 

N 

Z 

R 

M+ 

C 



Positive integers including 0. 
Integers. 

Real numbers. 

Positive real numbers. 
Complex numbers. 



Signals 

m 

f\ n \ 

8{t) 

8[n] 

l[a,6] 



Continuous time signal. 

Discrete signal. 

Dirac distribution (A. 30). 

Discrete Dirac (3.17). 

Indicator function which is 1 in [a, b] and 0 outside. 



Spaces 

Co 

C p 

C°° 

W S (K) 

L 2 (R) 



Uniformly continuous functions (7.240). 
p times continuously differentiable functions. 
Infinitely differentiable functions. 

Sobolev s times differentiable functions (9.5). 
Finite energy functions J \f{t)\ 2 clt < + 00 . 
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L P (M) Functions such that f \f{t)\ p clt < +oo. 

1 2 ( Z ) Finite energy discrete signals Ylt=-oo l /[ n ]| 2 < +°°- 

1 P ( Z ) Discrete signals such that |/[ n ]| p < +oo. 

C /V Complex signals of size N. 

U © V Direct sum of two vector spaces. 

U © V Tensor product of two vector spaces (A. 19). 

Operators 

Id Identity. 

/'(f) Derivative 

f^ p \t) Derivative d 'j^ of order p . 

V f(M,y ) Gradient vector (6.54). 

/ * g(t) Continuous time convolution (2.2). 

f * g[n] Discrete convolution (3.18). 

/ ® d[ n ] Circular convolution (3.58) 

Transforms 

/M 

m 

Sf(u, s) 

PsficO 
W f(u, s) 

Pwf(u,Z) 

Pvf(u,Z) 

A /(«,£) 

Probability 

X Random variable. 

E{A" } Expected value. 

TL{X) Entropy (11.4). 

UdiX) Differential entropy (11.20). 

Cov(Xi,X 2 ) Covariance (A. 22). 

F[n\ Random vector. 

R F [ k] Autocovariance of a stationary process (A. 26). 



Fourier transform (2.6), (3.24). 

Discrete Fourier transform (3.34). 

Short-time windowed Fourier transform (4.11). 
Spectrogram (4.12). 

Wavelet transform (4.31). 

Scalogram (4.55). 

Wigner-Ville distribution (4.108). 

Ambiguity function (4.24). 
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Introduction to a Transient 
World 



After a few minutes in a restaurant we cease to notice the annoying- 
hubbub of surrounding conversations, but a sudden silence reminds 
us of the presence of neighbors. Our attention is clearly attracted by 
transients and movements as opposed to stationary stimuli, which we 
soon ignore. Concentrating on transients is probably a strategy for 
selecting important information from the overwhelming amount of data 
recorded by our senses. Yet, classical signal processing has devoted 
most of its efforts to the design of time-invariant and space-invariant 
operators, that modify stationary signal properties. This has led to the 
indisputable hegemony of the Fourier transform, but leaves aside many 
information-processing applications. 

The world of transients is considerably larger and more complex 
than the garden of stationary signals. The search for an ideal Fourier- 
like basis that would simplify most signal processing is therefore a hope- 
less quest. Instead, a multitude of different transforms and bases have 
proliferated, among which wavelets are just one example. This book 
gives a guided tour in this jungle of new mathematical and algorithmic 
results, while trying to provide an intuitive sense of orientation. Major 
ideas are outlined in this first chapter. Section 1.5.2 serves as a travel 
guide and introduces the reproducible experiment approach based on 
the WaveLab and LastWave softwares. It also discusses the use of 
level numbers — landmarks that can help the reader keep to the main 
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roads. 



1.1 Fourier Kingdom 



The Fourier transform rules over linear time-invariant signal processing 
because sinusoidal waves e luJt are eigenvectors of linear time-invariant 
operators. A linear time-invariant operator L is entirely specified by 
the eigenvalues h(u ) ): 

VcjgM , Le iut = Ii(uj) e iuJt . (1.1) 



To compute Lf , a signal / is decomposed as a sum of sinusoidal eigen- 
vectors {e 1 *' 



f(t) 



1 

27 r 



f(uj) Q imt cluo 



( 1 . 2 ) 



If / has finite energy, the theory of Fourier integrals presented in 
Chapter 2 proves that the amplitude f(io) of each sinusoidal wave e lu)i 
is the Fourier transform of /: 



/ +oo 

me-^dt. 

-OO 



(1.3) 



Applying the operator L to / in (1.2) and inserting the eigenvector 
expression (1.1) gives 

1 /»+ OO 

L f( f ) = ^l f{uj)h{uj)e luJt du. (1.4) 

The operator L amplifies or attenuates each sinusoidal component e 1UJt 
of / by h(co). It is a frequency filtering of /. 

As long as we are satisfied with linear time-invariant operators, 
the Fourier transform provides simple answers to most questions. Its 
richness makes it suitable for a wide range of applications such as signal 
transmissions or stationary signal processing. 

However, if we are interested in transient phenomena^ — a word pro- 
nounced at a particular time, an apple located in the left corner of an 
image — the Fourier transform becomes a cumbersome tool. 
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The Fourier coefficient is obtained in (1.3) by correlating / with a 
sinusoidal wave e zult . Since the support of covers the whole real 
line, f(uj) depends on the values f(t) for all times t G 1 This global 
“mix” of information makes it difficult to analyze any local property of 
/ from /. Chapter 4 introduces local time-frequency transforms, which 
decompose the signal over waveforms that are well localized in time 
and frequency. 



1.2 Time- Frequency Wedding 

The uncertainty principle states that the energy spread of a function 
and its Fourier transform cannot be simultaneously arbitrarily small. 
Motivated by quantum mechanics, in 1946 the physicist Gabor [187] 
defined elementary time-frequency atoms as waveforms that have a 
minimal spread in a time-frequency plane. To measure time-frequency 
“information” content, he proposed decomposing signals over these el- 
ementary atomic waveforms. By showing that such decompositions are 
closely related to our sensitivity to sounds, and that they exhibit im- 
portant structures in speech and music recordings, Gabor demonstrated 
the importance of localized time-frequency signal processing. 

Chapter 4 studies the properties of windowed Fourier and wavelet 
transforms, computed by decomposing the signal over different fami- 
lies of time-frequency atoms. Other transforms can also be defined by 
modifying the family of time-frequency atoms. A unified interpretation 
of local time-frequency decompositions follows the time-frequency en- 
ergy density approach of Ville. In parallel to Gabor’s contribution, in 
1948 Ville [342], who was an electrical engineer, proposed analyzing the 
time-frequency properties of signals / with an energy density defined 

by +oo 

Pvf(t.u) = jf~ / (i + T ~) r (t - 5 ) e-™ dr. 

Once again, theoretical physics was ahead, since this distribution had 
already been introduced in 1932 by Wigner [351] in the context of 
quantum mechanics. Chapter 4 explains the path that relates Wigner- 
Ville distributions to windowed Fourier and wavelet transforms, or any 
linear time-frequency transform. 
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1 . 2.1 Windowed Fourier Transform 

Gabor atoms are constructed by translating in time and frequency a 
time window g: 

9u,i{t) = g{t - u) e^. 

The energy of g u £ is concentrated in the neighborhood of u over an 
interval of size a t , measured by the standard deviation of \g\ 2 . Its 
Fourier transform is a translation by £ of the Fourier transform g of g\ 

gU 0 <‘ ( 1 - 5 ) 

The energy of g u ^ is therefore localized near the frequency £, over an 
interval of size which measures the domain where g(uj) is non- 
negiigible. In a time- frequency plane the energy spread of the 

atom g U £ is symbolically represented by the Heisenberg rectangle illus- 
trated by Figure 1.1. This rectangle is centered at (u, £) and has a time 
width (7 t and a frequency width a w . The uncertainty principle proves 
that its area satisfies 

1 

efiCG > 

This area is minimum when g is a Gaussian, in which case the atoms 
g U £ are called Gabor functions. 



jgy,/®) 1 






■ K 

ig u , 5 (t) i 



'G.v® 1 



Figure 1.1: Time-frequency boxes (“Heisenberg rectangles”) represent- 
ing the energy spread of two Gabor atoms. 
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The windowed Fourier transform defined by Gabor correlates a sig- 
nal / with each atom g u 

/ +oo r+oo 

/ f{t) g{t- u)e~ lit clt. (1.6) 

■00 J— oo 

It is a Fourier integral that is localized in the neighborhood of u by the 
window g(t — u). This time integral can also be written as a frequency 
integral by applying the Fourier Parseval formula (2.25): 

1 /»+oo 

Sf M = Y J fHg*u,n(u)ciuj. (i.7) 

The transform Sf(u,£) thus depends only on the values f(t ) and f(uj) 
in the time and frequency neighborhoods where the energies of g u ^ 
and g U £ are concentrated. Gabor interprets this as a “quantum of 
information” over the time-frequency rectangle illustrated in Figure 

1 . 1 . 

When listening to music, we perceive sounds that have a frequency 
that varies in time. Measuring time-varying harmonics is an important 
application of windowed Fourier transforms in both music and speech 
recognition. A spectral line of / creates high amplitude windowed 
Fourier coefficients Sf(u,£) at frequencies £(u) that depend on the 
time u. The time evolution of such spectral components is therefore 
analyzed by following the location of large amplitude coefficients. 

1.2.2 Wavelet Transform 

In reflection seismology, Morlet knew that the modulated pulses sent 
underground have a duration that is too long at high frequencies to 
separate the returns of fine, closely-spaced layers. Instead of emitting 
pulses of equal duration, he thus thought of sending shorter waveforms 
at high frequencies. Such waveforms are simply obtained by scaling a 
single function called a wavelet. Although Grossmann was working in 
theoretical physics, he recognized in Morlet’s approach some ideas that 
were close to his own work on coherent quantum states. Nearly forty 
years after Gabor, Morlet and Grossmann reactivated a fundamental 
collaboration between theoretical physics and signal processing, which 
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led to the formalization of the continuous wavelet transform [200]. Yet, 
these ideas were not totally new to mathematicians working in harmonic 
analysis, or to computer vision researchers studying multiscale image 
processing. It was thus only the beginning of a rapid catalysis that 
brought together scientists with very different backgrounds, first around 
coffee tables, then in more luxurious conferences. 

A wavelet U is a function of zero average: 




0, 



which is dilated with a scale parameter s, and translated by u: 







( 1 . 8 ) 



The wavelet transform of / at the scale s and position u is computed 
by correlating / with a wavelet atom 

wf(u, s) = j [ + “ /(<) A r (*-A\ ,it. (i.9) 



Time-Frequency Measurements Like a windowed Fourier trans- 
form, a wavelet transform can measure the time-frequency variations of 
spectral components, but it has a different time-frequency resolution. 
A wavelet transform correlates / with ip u ^ s . By applying the Fourier 
Parseval formula (2.25), it can also be written as a frequency integra- 
tion: 

/ — |— OO 1 s*-\-oo 

f{t ) 4i,sit) dt = — J f{u) U*,sH du} - (1.10) 

The wavelet coefficient Wf(u,s) thus depends on the values f(t) and 
in the time-frequency region where the energy of y'Vs a ll( l V’u.s is 
concentrated. Time varying harmonics are detected from the position 
and scale of high amplitude wavelet coefficients. 

In time, r ip. ll:S is centered at u with a spread proportional to s. Its 
Fourier transform is calculated from (1.8): 



VvsM = e muJ Vs iiisLu), 
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where 'W is the Fourier transform of U. To analyze the phase information 
of signals, a complex analytic wavelet is used. This means that 0(u;) = 
0 for u) < 0. Its energy is concentrated in a positive frequency interval 
centered at rj. The energy of (/'«,« (^) is therefore concentrated over 
a positive frequency interval centered at r\/s , whose size is scaled by 
1/s. In the time-frequency plane, a wavelet atom -w u .s is symbolically 
represented by a rectangle centered at ( u , r]/s). The time and frequency 
spread are respectively proportional to s- and 1/s. When s varies, the 
height and width of the rectangle change but its area remains constant, 
as illustrated by Figure 1.2. 






s 



\% s (©)l 

V “o’^o 



v u , s a 
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Figure 1.2: Time-frequency boxes of two wavelets (/> UiS and T U0 ^ 0 . When 
the scale s decreases, the time support is reduced but the frequency 
spread increases and covers an interval that is shifted towards high 
frequencies. 



Multiscale Zooming The wavelet transform can also detect and 
characterize transients with a zooming procedure across scales. Sup- 
pose that y' ! is real. Since it has a zero average, a wavelet coefficient 
W f(u, s ) measures the variation of / in a neighborhood of u whose size 
is proportional to s. Sharp signal transitions create large amplitude 
wavelet coefficients. Chapter 6 relates the pointwise regularity of / to 
the asymptotic decay of the wavelet transform Wf(u, s), when s goes 
to zero. Singularities are detected by following across scales the local 
maxima of the wavelet transform. In images, high amplitude wavelet 
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coefficients indicate the position of edges, which are sharp variations 
of the image intensity. Different scales provide the contours of image 
structures of varying sizes. Such multiscale edge detection is particu- 
larly effective for pattern recognition in computer vision [113]. 

The zooming capability of the wavelet transform not only locates 
isolated singular events, but can also characterize more complex mul- 
tifractal signals having non-isolated singularities. Mandelbrot [43] was 
the first to recognize the existence of multifractals in most corners of 
nature. Scaling one part of a multifractal produces a signal that is 
statistically similar to the whole. This self-similarity appears in the 
wavelet transform, which modifies the analyzing scale. /.From the global 
wavelet transform decay, one can measure the singularity distribution 
of multifractals. This is particularly important in analyzing their prop- 
erties and testing models that explain the formation of multifractals in 
physics. 



1.3 Bases of Time- Frequency Atoms 

The continuous windowed Fourier transform Sf(u , £) and the wavelet 
transform Wf(u,s ) are two-dimensional representations of a one-di- 
mensional signal /. This indicates the existence of some redundancy 
that can be reduced and even removed by subsampling the parameters 
of these transforms. 

Frames Windowed Fourier transforms and wavelet transforms can be 
written as inner products in L 2 (R) , with their respective time-frequency 
atoms 

/ +oo 

nt)g* u ^t)dt = {f, g u ^ 

-OO 

and 

/ +oo 

f( f ) </’«,«(*) dt = ( f , </’«,«)• 

-oo 

Subsampling both transforms defines a complete signal representation 
if any signal can be reconstructed from linear combinations of discrete 
families of windowed Fourier atoms {<7u„g fc }(n,/t)ez 2 an d wavelet atoms 




1.3. BASES OF TIME-FREQUENCY ATOMS 



29 



{' i Pu n ,s J }(j,n)e r i 2 - The frame theory of Chapter 5 discusses what condi- 
tions these families of waveforms must meet if they are to provide stable 
and complete representations. 

Completely eliminating the redundancy is equivalent to building 
a basis of the signal space. Although wavelet bases were the first to 
arrive on the research market, they have quickly been followed by other 
families of orthogonal bases, such as wavelet packet and local cosine 
bases. 

1.3.1 Wavelet Bases and Filter Banks 

In 1910, Haar [202] realized that one can construct a simple piecewise 
constant function 



1 if 0 < £ < 1/2 
4 it ) = { -1 if 1/2 < t < 1 
0 otherwise 



whose dilations and translations generate an orthonormal basis of 1/ 



, /x 1 (t-2?n 

c = —f= i 1 






21 



( j , n ) ez 2 



Any finite energy signal / can be decomposed over this wavelet orthog- 
onal basis A* 



+00 +00 

/ = (Til) 

j —— 00 n=— 00 

Since A(£) has a zero average, each partial sum 

+00 

dj ( t ) = ( fiUj , n ) ' 0 j,n(f) 

n=— 00 

can be interpreted as detail variations at the scale 2 :l . These layers of 
details are added at all scales to progressively improve the approxima- 
tion of /, and ultimately recover /. 
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If / has smooth variations, we should obtain a precise approximation 
when removing fine scale details, which is done by truncating the sum 
(1.11). The resulting approximation at a scale 2 J is 

+ OG 

fj(t) = 

i=J 

For a Haar basis, fj is piecewise constant. Piecewise constant ap- 
proximations of smooth functions are far from optimal. For example, a 
piecewise linear approximation produces a smaller approximation error. 
The story continues in 1980, when Stromberg [322] found a piecewise 
linear function -w that also generates an orthonormal basis and gives 
better approximations of smooth functions. Meyer was not aware of 
this result, and motivated by the work of Morlet and Grossmann he 
tried to prove that there exists no regular wavelet </’ that generates an 
orthonormal basis. This attempt was a failure since he ended up con- 
structing a whole family of orthonormal wavelet bases, with functions 
U that are infinitely continuously differentiable [270]. This was the fun- 
damental impulse that lead to a widespread search for new orthonormal 
wavelet bases, which culminated in the celebrated Daubechies wavelets 
of compact support [144], 

The systematic theory for constructing orthonormal wavelet bases 
was established by Meyer and Mallat through the elaboration of mul- 
tiresolution signal approximations [254], presented in Chapter 7. It 
was inspired by original ideas developed in computer vision by Burt 
and Adelson [108] to analyze images at several resolutions. Digging 
more into the properties of orthogonal wavelets and multiresolution 
approximations brought to light a surprising relation with filter banks 
constructed with conjugate mirror filters. 

Filter Banks Motivated by speech compression, in 1976 Croisier, 
Esteban and Galand [141] introduced an invertible filter bank, which 
decomposes a discrete signal f[n] in two signals of half its size, using a 
filtering and subsampling procedure. They showed that f[n] can be re- 
covered from these subsampled signals by canceling the aliasing terms 
with a particular class of filters called conjugate mirror filters. This 
breakthrough led to a 10-year research effort to build a complete filter 
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bank theory. Necessary and sufficient conditions for decomposing a sig- 
nal in subsampled components with a filtering scheme, and recovering 
the same signal with an inverse transform, were established by Smith 
and Barnwell [316], Vaidyanathan [336] and Vetterli [339]. 

The multiresolution theory of orthogonal wavelets proves that any 
conjugate mirror filter characterizes a wavelet r tp that generates an or- 
thonormal basis of L 2 (M). Moreover, a fast discrete wavelet transform 
is implemented by cascading these conjugate mirror filters. The equiv- 
alence between this continuous time wavelet theory and discrete filter 
banks led to a new fruitful interface between digital signal processing 
and harmonic analysis, but also created a culture shock that is not 
totally resolved. 



Continuous Versus Discrete and Finite Many signal processors 
have been and still are wondering what is the point of these continuous 
time wavelets, since all computations are performed over discrete sig- 
nals, with conjugate mirror filters. Why bother with the convergence 
of infinite convolution cascades if in practice we only compute a finite 
number of convolutions? Answering these important questions is nec- 
essary in order to understand why throughout this book we alternate 
between theorems on continuous time functions and discrete algorithms 
applied to finite sequences. 

A short answer would be “simplicity”. In L 2 (R), a wavelet basis 
is constructed by dilating and translating a single function '(/’• Sev- 
eral important theorems relate the amplitude of wavelet coefficients 
to the local regularity of the signal /. Dilations are not defined over 
discrete sequences, and discrete wavelet bases have therefore a more 
complicated structure. The regularity of a discrete sequence is not well 
defined either, which makes it more difficult to interpret the amplitude 
of wavelet coefficients. A theory of continuous time functions gives 
asymptotic results for discrete sequences with sampling intervals de- 
creasing to zero. This theory is useful because these asymptotic results 
are precise enough to understand the behavior of discrete algorithms. 

Continuous time models are not sufficient for elaborating discrete 
signal processing algorithms. Uniformly sampling the continuous time 
wavelets {'Uj. n (l ) h does not produce a discrete orthonormal ba- 
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sis. The transition between continuous and discrete signals must be 
done with great care. Restricting the constructions to finite discrete 
signals adds another layer of complexity because of border problems. 
How these border issues affect numerical implementations is carefully 
addressed once the properties of the bases are well understood. To 
simplify the mathematical analysis, throughout the book continuous 
time transforms are introduced first. Their discretization is explained 
afterwards, with fast numerical algorithms over finite signals. 

1.3.2 Tilings of Wavelet Packet and Local Cosine 
Bases 

Orthonormal wavelet bases are just an appetizer. Their construction 
showed that it is not only possible but relatively simple to build or- 
thonormal bases of L 2 (R) composed of local time-frequency atoms. The 
completeness and orthogonality of a wavelet basis is represented by a 
tiling that covers the time-frequency plane with the wavelets’ time- 
frequency boxes. Figure 1.3 shows the time-frequency box of each E :) . n . 
which is translated by 2%, with a time and a frequency width scaled 
respectively by 2 J and 2~T 

One can draw many other tilings of the time-frequency plane, with 
boxes of minimal surface as imposed by the uncertainty principle. Chap- 
ter 8 presents several constructions that associate large families of or- 
tlionormal bases of L 2 (M) to such new tilings. 

Wavelet Packet Bases A wavelet orthonormal basis decomposes 
the frequency axis in dyadic intervals whose sizes have an exponential 
growth, as shown by Figure 1.3. Coifman, Meyer and Wickerliauser 
[139] have generalized this fixed dyadic construction by decomposing 
the frequency in intervals whose bandwidths may vary. Each frequency 
interval is covered by the time-frequency boxes of wavelet packet func- 
tions that are uniformly translated in time in order to cover the whole 
plane, as shown by Figure 1.4. 

Wavelet packet functions are designed by generalizing the filter bank 
tree that relates wavelets and conjugate mirror filters. The frequency 
axis division of wavelet packets is implemented with an appropriate 
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sequence of iterated convolutions with conjugate mirror filters. Fast 
numerical wavelet packet decompositions are thus implemented with 
discrete filter banks. 







M Hi 



A 



j+i.p 
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Figure 1.3: The time-frequency boxes of a wavelet basis define a tiling 
of the time-frequency plane. 



Local Cosine Bases Orthonormal bases of L 2 (R) can also be con- 
structed by dividing the time axis instead of the frequency axis. The 
time axis is segmented in successive finite intervals [a p , ci p+ 1 ]. The local 
cosine bases of Malvar [262] are obtained by designing smooth windows 
g p (t) that cover each interval [a p . a v+ \ ] , and multiplying them by cosine 
functions cos(£f + <p ) of different frequencies. This is yet another idea 
that was independently studied in physics, signal processing and mathe- 
matics. Malvar’s original construction was done for discrete signals. At 
the same time, the physicist Wilson [353] was designing a local cosine 
basis with smooth windows of infinite support, to analyze the proper- 
ties of quantum coherent states. Malvar bases were also rediscovered 
and generalized by the harmonic analysts Coifman and Meyer [138]. 
These different views of the same bases brought to light mathematical 
and algorithmic properties that opened new applications. 

A multiplication by cos (£t + A) translates the Fourier transform 
g p (to) of g p (t) by ±£. Over positive frequencies, the time-frequency 
box of the modulated window g p (t) cos (££ + <p) is therefore equal to 
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Figure 1.4: A wavelet packet basis divides the frequency axis in separate 
intervals of varying sizes. A tiling is obtained by translating in time 
the wavelet packets covering each frequency interval. 



the time-frequency box of g p translated by £ along frequencies. The 
time-frequency boxes of local cosine basis vectors define a tiling of the 
time-frequency plane illustrated by Figure 1.5. 



1.4 Bases for What? 

The tiling game is clearly unlimited. Local cosine and wavelet packet 
bases are important examples, but many other kinds of bases can be 
constructed. It is thus time to wonder how to select an appropriate 
basis for processing a particular class of signals. The decomposition 
coefficients of a signal in a basis define a representation that highlights 
some particular signal properties. For example, wavelet coefficients 
provide explicit information on the location and type of signal singu- 
larities. The problem is to find a criterion for selecting a basis that is 
intrinsically well adapted to represent a class of signals. 

Mathematical approximation theory suggests choosing a basis that 
can construct precise signal approximations with a linear combination 
of a small number of vectors selected inside the basis. These selected 
vectors can be interpreted as intrinsic signal structures. Compact cod- 
ing and signal estimation in noise are applications where this criterion 
is a good measure of the efficiency of a basis. Linear and non-linear 
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Figure 1.5: A local cosine basis divides the time axis with smooth 
windows g p (t). Multiplications with cosine functions translate these 
windows in frequency and yield a complete cover of the time-frequency 
plane. 

procedures are studied and compared. This will be the occasion to 
show that non-linear does not always mean complicated. 



1.4.1 Approximation 

The development of orthonormal wavelet bases has opened a new bridge 
between approximation theory and signal processing. This exchange is 
not quite new since the fundamental sampling theorem comes from an 
interpolation theory result proved in 1935 by Whittaker [349]. However, 
the state of the art of approximation theory has changed since 1935. 
In particular, the properties of non-linear approximation schemes are 
much better understood, and give a firm foundation for analyzing the 
performance of many non-linear signal processing algorithms. Chapter 
9 introduces important approximation theory results that are used in 
signal estimation and data compression. 
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Linear Approximation A linear approximation projects the signal 
/ over M vectors that are chosen a priori in an orthonormal basis 
B = {9m} mein say the first M: 

M—l 

}m = ^2 {f, 9m) 9m- (1-12) 

m= 0 

Since the basis is orthonormal, the approximation error is the sum of 
the remaining squared inner products 

+oo 

e[m] = ii/-/„ii 2 = V k/.j,„)| 2 . 

m=M 

The accuracy of this approximation clearly depends on the properties 
of / relative to the basis B. 

A Fourier basis yields efficient linear approximations of uniformly 
smooth signals, which are projected over the M lower frequency sinu- 
soidal waves. When M increases, the decay of the error e[M] can be 
related to the global regularity of /. Chapter 9 characterizes spaces of 
smooth functions from the asymptotic decay of e[M] in a Fourier basis. 

In a wavelet basis, the signal is projected over the M larger scale 
wavelets, which is equivalent to approximating the signal at a fixed res- 
olution. Linear approximations of uniformly smooth signals in wavelet 
and Fourier bases have similar properties and characterize nearly the 
same function spaces. 

Suppose that we want to approximate a class of discrete signals of 
size N , modeled by a random vector F[n\. The average approximation 
error when projecting F over the first M basis vectors of an orthonormal 
basis B = {, g m }o< m<N is 

:,V I 

= E{||F - Fm || 2 } = X E{|(F, Sra )| 2 }. 

m=M 

Chapter 9 proves that the basis that minimizes this error is the Karhunen- 
Loeve basis, which diagonalizes the covariance matrix of F. This re- 
markable property explains the fundamental importance of the Ivarhunen- 
Loeve basis in optimal linear signal processing schemes. This is however 
only a beginning. 
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Non-linear Approximation The linear approximation (1.12) is im- 
proved if we choose a posteriori the M vectors g m , depending on /. The 
approximation of / with M vectors whose indexes are in I M is 

I'm = (/’ 9m) 9m- (1-13) 

mE/ M 

The approximation error is the sum of the squared inner products with 
vectors not in I M '■ 

m = \\s - h<r = E 

n0M 

To minimize this error, we choose lu to be the set of M vectors that 
have the largest inner product amplitude \ {f,g m )\- This approximation 
scheme is non-linear because the approximation vectors change with /. 

The amplitude of inner products in a wavelet basis is related to the 
local regularity of the signal. A non-linear approximation that keeps the 
largest wavelet inner products is equivalent to constructing an adaptive 
approximation grid, whose resolution is locally increased where the sig- 
nal is irregular. If the signal has isolated singularities, this non-linear 
approximation is much more precise than a linear scheme that main- 
tains the same resolution over the whole signal support. The spaces 
of functions that are well approximated by non-linear wavelet schemes 
are thus much larger than for linear schemes, and include functions 
with isolated singularities. Bounded variation signals are important 
examples that provide useful models for images. 

In this non-linear setting, Ivarhunen-Loeve bases are not optimal for 
approximating the realizations of a process F. It is often easy to find a 
basis that produces a smaller non-linear error than a Ivarhunen-Loeve 
basis, but there is yet no procedure for computing the optimal basis 
that minimizes the average non-linear error. 

Adaptive Basis Choice Approximations of non-linear signals can 
be improved by choosing the approximation vectors in families that are 
much larger than a basis. Music recordings, which include harmonic 
and transient structures of very different types, are examples of complex 
signals that are not well approximated by a few vectors chosen from a 
single basis. 
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A new degree of freedom is introduced if instead of choosing a priori 
the basis B , we adaptively select a “best” basis, depending on the signal 
/. This best basis minimizes a cost function related to the non-linear 
approximation error of /. A fast dynamical programming algorithm 
can find the best basis in families of wavelet packet basis or local cosine 
bases [140]. The selected basis corresponds to a time-frequency tiling 
that “best” concentrates the signal energy over a few time-frequency 
atoms. 

Orthogonality is often not crucial in the post-processing of signal 
coefficients. One may thus further enlarge the freedom of choice by ap- 
proximating the signal / with M non-orthogonal vectors {g rm }o< m <M, 
chosen from a large and redundant dictionary V = { ry. }., f , : 



M—l 

f M — y J ( hn 97„ 



E 

m = 0 



Globally optimizing the choice of these M vectors in V can lead to 
a combinatorial explosion. Chapter 9 introduces sub-optimal pursuit 
algorithms that reduce the numerical complexity, while constructing- 
efficient approximations [119, 259]. 



1.4.2 Estimation 

The estimation of a signal embedded in noise requires taking advantage 
of any prior information about the signal and the noise. Chapter 10 
studies and contrasts several approaches: Bayes versus minimax, lin- 
ear versus non-linear. Until recently, signal processing estimation was 
mostly Bayesian and linear. Non-linear smoothing algorithms existed 
in statistics, but these procedures were often ad-hoc and complex. Two 
statisticians, Donoho and Johnstone [167], changed the game by prov- 
ing that a simple thresholding algorithm in an appropriate basis can be 
a nearly optimal non-linear estimator. 



Linear versus Non-Linear A signal f[n] of size N is contaminated 
by the addition of a noise. This noise is modeled as the realization of 
a random process TT[n], whose probability distribution is known. The 
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measured data are 

X[n\ = f[n\ + W[n\ . 

The signal / is estimated by transforming the noisy data X with an 
operator D: 

F = DX . 

The risk of the estimator F of / is the average error, calculated with 
respect to the probability distribution of the noise W: 

r(DJ) = E{\\f-DX\\ 2 } . 



It is tempting to restrict oneself to linear operators D , because 
of their simplicity. Yet, non-linear operators may yield a much lower 
risk. To keep the simplicity, we concentrate on diagonal operators in 
a basis B. If the basis B gives a sparse signal representation, Donoho 
and Johnstone [167] prove that a nearly optimal non-linear estimator 
is obtained with a simple thresholding: 

N - 1 

F = DX = X Pt({X , g m ) ) g m . 
m— o 



The thresholding function pr(x) sets to zero all coefficients below T: 



p T {x) 



f° 


if .r 




if \x\ 



< T 
> T ' 



In a wavelet basis, such a thresholding implements an adaptive smooth- 
ing, which averages the data X with a kernel that depends on the 
regularity of the underlying signal /. 



Bayes Versus Minimax To optimize the estimation operator D , 
one must take advantage of any prior information available about the 
signal /. In a Bayes framework, / is considered as a realization of a 
random vector F. whose probability distribution 7 r is known a priori. 
Thomas Bayes was a XVII century philosopher, who first suggested 
and investigated methods sometimes referred as “inverse probability 
methods,” which are basic to the study of Bayes estimators. The Bayes 
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risk is the expected risk calculated with respect to the prior probability 
distribution 7 r of the signal: 

r(D,ir) = E^{r{D,F)}. 

Optimizing D among all possible operators yields the minimum Bayes 
risk: 

r n{v) = inf r(D,n) . 

all D 

Complex signals such as images are clearly non-Gaussian, and there 
is yet no reliable probabilistic model that incorporates the diversity of 
structures such as edges and textures. 

In the 1940’s, Wald brought a new perspective on statistics, through 
a decision theory partly imported from the theory of games. This point 
of view offers a simpler way to incorporate prior information on complex 
signals. Signals are modeled as elements of a particular set 0, without 
specifying their probability distribution in this set. For example, large 
classes of images belong to the set of signals whose total variation is 
bounded by a constant. To control the risk for any / G 0, we compute 
the maximum risk 

r(D. 0) = sup r{D , /) . 
fee 

The minimax risk is the lower bound computed over all operators D: 

r„,(0) = inf r{D,0). 

all D 

In practice, the goal is to find an operator D that is simple to implement 
and which yields a risk close the minimax lower bound. 

Unless 0 has particular convexity properties, non-linear estimators 
have a much lower risk than linear estimators. If IT is a white noise 
and signals in © have a sparse representation in B , then Chapter 10 
shows that thresholding estimators are nearly minimax optimal. In 
particular, the risk of wavelet thresholding estimators is close to the 
minimax risk for wide classes of piecewise smooth signals, including 
bounded variation images. Thresholding estimators are extended to 
more complex problems such as signal restorations and deconvolutions. 
The performance of a thresholding may also be improved with a best 
basis search or a pursuit algorithm that adapts the basis B to the noisy 
data. However, more adaptivity does not necessarily means less risk. 
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1.4.3 Compression 

Limited storage space and transmission through narrow band-width 
channels create a need for compressing signals while minimizing their 
degradation. Transform codes compress signals by decomposing them 
in an orthonormal basis. Chapter 11 introduces the basic information 
theory needed to understand these codes and optimize their perfor- 
mance. Bayes and minimax approaches are studied. 

A transform code decomposes a signal / in an orthonormal basis 
B — {9m-}o<m.<N- 

N - 1 

/ = ^ ] (/) 9m) 9m ■ 

m = 0 

The coefficients (/, g m ) are approximated by quantized values Q{{f, g m )) 
A signal / is restored from these quantized coefficients: 



N - 1 

f = ^2 Q ((f, 9 m)) 9 m- 

m = 0 

A binary code is used to record the quantized coefficients Q{{f,g m )) 
with R bits. The resulting distortion is 

d(RJ) = \\f-f \\ 2 - 

At the compression rates currently used for images, d(R, f ) has a highly 
non-linear behavior, which depends on the precision of non-linear ap- 
proximations of / from a few vectors in the basis B. 

To compute the distortion rate over a whole signal class, the Bayes 
framework models signals as realizations of a random vector F whose 
probability distribution 7r is known. The goal is then to optimize the 
quantization and the basis B in order to minimize the average distortion 
rate d(R, n) = E 7T {d(R, F)}. This approach applies particularly well to 
audio signals, which are relatively well modeled by Gaussian processes. 

In the absence of stochastic models for complex signals such as 
images, the minimax approach computes the maximum distortion by 
assuming only that the signal belongs to a prior set 0. Chapter 11 
describes the implementation of image transform codes in wavelet bases 
and block cosine bases. The minimax distortion rate is calculated for 
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bounded variation images, and wavelet transform codes are proved to 
be nearly minimax optimal. 

For video compression, one must also take advantage of the sim- 
ilarity of images across time. The most effective algorithms predict 
each image from a previous one by compensating for the motion, and 
the error is recorded with a transform code. MPEG video compression 
standards are described. 



1.5 Travel Guide 

1.5.1 Reproducible Computational Science 

The book covers the whole spectrum from theorems on functions of 
continuous variables to fast discrete algorithms and their applications. 
Section 1.3.1 argues that models based on continuous time functions 
give useful asymptotic results for understanding the behavior of dis- 
crete algorithms. Yet, a mathematical analysis alone is often unable 
to predict fully the behavior and suitability of algorithms for specific 
signals. Experiments are necessary and such experiments ought in prin- 
ciple be reproducible, just like experiments in other fields of sciences. 

In recent years, the idea of reproducible algorithmic results has been 
championed by Claerbout [127] in exploration geophysics. The goal of 
exploration seismology is to produce the highest possible quality image 
of the subsurface. Part of the scientific know-how involved includes ap- 
propriate parameter settings that lead to good results on real datasets. 
The reproducibility of experiments thus requires having the complete 
software and full source code for inspection, modification and applica- 
tion under varied parameter settings. 

Donoho has advocated the reproducibility of algorithms in wavelet 
signal processing, through the development of a WaveLab toolbox, 
which is a large library of Matlab routines. He summarizes Claer- 
bout’s insight in a slogan: [105] 

An article about computational science in a scientific pub- 
lication is not the scholarship itself, it is merely advertising 
of the scholarship. The actual scholarship is the complete 
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software environment and the complete set of instructions 
which generated the figures. 

Following this perspective, all wavelet and time-frequency tools pre- 
sented in this book are available in WaveLab. The figures can be 
reproduced as demos and the source code is available. The LastWave 
package offers a similar library of wavelet related algorithms that are 
programmed in C, with a user-friendly shell interface and graphics. 
Appendix B explains how to retrieve these toolboxes, and relates their 
subroutines to the algorithms described in the book. 

1.5.2 Road Map 

Sections are kept as independent as possible, and some redundancy is 
introduced to avoid imposing a linear progression through the book. 
The preface describes several possible paths for a graduate signal pro- 
cessing or an applied mathematics course. A partial hierarchy between 
sections is provided by a level number. If a section has a level number 
then all sub-sections without number inherit this level, but a higher 
level number indicates that a subsection is more advanced. 

Sections of level 1 introduce central ideas and techniques for wavelet 
and time-frequency signal processing. These would typically be taught 
in an introductory course. The first sections of Chapter 7 on wavelet 
orthonormal bases are examples. Sections of level 2 concern results that 
are important but which are either more advanced or dedicated to an 
application. Wavelet packets and local cosine bases in Chapter 8 are 
of that sort. Applications to estimation and data compression belong 
to this level, including fundamental results such as Wiener filtering. 
Sections of level 3 describe advanced results that are at the frontier 
of research or mathematically more difficult. These sections open the 
book to open research problems. 

All theorems are explained in the text and reading the proofs is 
not necessary to understand the results. Proofs also have a level in- 
dex specifying their difficulty, as well as their conceptual or technical 
importance. These levels have been set by trying to answer the ques- 
tion: “Should this proof be taught in an introductory course ?” Level 1 
means probably, level 2 probably not, level 3 certainly not. Problems 
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at the end of each chapter follow this hierarchy of levels. Direct ap- 
plications of the course are at the level 1 . Problems at level 2 require 
more thinking. Problems of level 3 are often at the interface of research 
and can provide topics for deeper projects. 

The book begins with Chapters 2 and 3, which review the Fourier 
transform properties and elementary discrete signal processing. They 
provide the necessary background for readers with no signal process- 
ing experience. Fundamental properties of local time-frequency trans- 
forms are presented in Chapter 4. The wavelet and windowed Fourier 
transforms are introduced and compared. The measurement of instan- 
taneous frequencies is used to illustrate the limitations of their time- 
frequency resolution. Wigner-Ville time-frequency distributions give a 
global perspective which relates all quadratic time-frequency distribu- 
tions. Frame theory is explained in Chapter 5. It offers a flexible frame- 
work for analyzing the properties of redundant or non-linear adaptive 
decompositions. Chapter 6 explains the relations between the decay of 
the wavelet transform amplitude across scales and local signal proper- 
ties. It studies applications involving the detection of singularities and 
analysis of multifractals. 

The construction of wavelet bases and their relations with filter 
banks are fundamental results presented in Chapter 7. An overdose of 
orthonormal bases can strike the reader while studying the construction 
and properties of wavelet packets and local cosine bases in Chapter 8. 
It is thus important to read in parallel Chapter 9, which studies the 
approximation performance of orthogonal bases. The estimation and 
data compression applications of Chapters 10 and 11 give life to most 
theoretical and algorithmic results of the book. These chapters offer 
a practical perspective on the relevance of these linear and non-linear 
signal processing algorithms. 




Chapter 2 



Fourier Kingdom 



The story begins in 1807 when Fourier presents a memoir to the In- 
stitut de France, where he claims that any periodic function can be 
represented as a series of harmonically related sinusoids. This idea had 
a profound impact in mathematical analysis, physics and engineering, 
but it took one and a half centuries to understand the convergence of 
Fourier series and complete the theory of Fourier integrals. 

Fourier was motivated by the study of heat diffusion, which is gov- 
erned by a linear differential equation. However, the Fourier transform 
diagonalizes all linear time-invariant operators, which are the building 
blocks of signal processing. It is therefore not only the starting point 
of our exploration but the basis of all further developments. 



2.1 Linear Time-Invariant Filtering 1 

Classical signal processing operations such as signal transmission, sta- 
tionary noise removal or predictive coding are implemented with linear 
time-invariant operators. The time invariance of an operator L means 
that if the input /(f) is delayed by r, / r (f) = f{t — r), then the output 
is also delayed by r: 

g{t) = Lf{t) => g(t — t) = Lf T (t). (2.1) 

For numerical stability, the operator L must have a weak form of con- 
tinuity, which means that Lf is modified by a small amount if / is 
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slightly modified. This weak continuity is formalized by the theory of 
distributions [66, 69], which guarantees that we are on a safe ground 
without further worrying about it. 

2.1.1 Impulse Response 

Linear time-invariant systems are characterized by their response to a 
Dirac impulse, defined in Appendix A. 7. If / is continuous, its value 
at t is obtained by an “integration” against a Dirac located at t. Let 
S u (t) = 5(t - u): 

/ +oo 

f{u)S u {t) clu. 

■00 

The continuity and linearity of L imply that 

/ +oo 

f (u) LS u (t) du. 

■00 

Let h be the impulse response of L: 

h(t) = LS(t). 

The time-invariance proves that L8 u (t ) = h(t — u) and hence 

/ +oo f+OO 

f(u)h(t — u)du= / h(u)f(t — u)du = h-kf(t). (2.2) 

-oo J — oo 

A time-invariant linear filter is thus equivalent to a convolution with 
the impulse response h. The continuity of / is not necessary. This 
formula remains valid for any signal / for which the convolution integral 
converges. 

Let us recall a few useful properties of convolution products: 

• Commutativity 

f*h{t) = h*f{t). (2.3) 

• Differentiation 

Jt U * h){t) = ft = /*§«■ (2-1) 

• Dirac convolution 

f -kS T (t) = f(t — t). (2.5) 
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Stability and Causality A filter is said to be causal if Lf(t ) does 
not depend on the values f(u) for u > t. Since 

/ +oo 

h{u) f{t - u ) du, 

■00 

this means that h(u) = 0 for u < 0. Such impulse responses are said to 
be causal. 

The stability property guarantees that Lf(t) is bounded if f(t ) is 
bounded. Since 

/ +oo f+OO 

\h{u)\ \f(t- u) I clu < sup \f(u)\ / \h(u)\clu, 

-OO UEM J — OO 

it is sufficient that \h(u)\ du < +oo. One can verify that this con- 
dition is also necessary if h is a function. We thus say that h is stable 
if it is integrable. 

Example 2.1 An amplification and delay system is defined by 

Lf{t) = Xf(t-r). 

The impulse response of this filter is h(t) = A S(t — r). 

Example 2.2 A uniform averaging of / over intervals of size T is 
calculated by 

1 rt+T/2 

Lf{t) = - f{u)du. 

1 Jt-T/2 

This integral can be rewritten as a convolution of / with the impulse 
response h = j. 1[_ t / 2 ,t/2]- 

2.1.2 Transfer Functions 

Complex exponentials c ,ujl are eigenvectors of convolution operators. 
Indeed 
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which yields 



/ +OO 

h(u) e~ iwu du = h{cu) eH 

■00 

The eigenvalue 

/ +oo 

h{u) e~ iwu du 

■OO 

is the Fourier transform of h at the frequency co. Since complex sinu- 
soidal waves e wt are the eigenvectors of time-invariant linear systems, 
it is tempting to try to decompose any function / as a sum of these 
eigenvectors. We are then able to express Lf directly from the eigen- 
values h(uo). The Fourier analysis proves that under weak conditions 
on /, it is indeed possible to write it as a Fourier integral. 

2.2 Fourier Integrals 1 

To avoid convergence issues, the Fourier integral is first defined over 
the space L 1 (M) of integrable functions [57]. It is then extended to the 
space L 2 (M) of finite energy functions [24], 

2.2.1 Fourier Transform in L^R) 

The Fourier integral 

/ +oo 

dt (2.6) 

-OO 

measures “how much” oscillations at the frequency co there is in /. If 
/ € L 1 (R) this integral does converge and 

/ +oo 

|/(t)| dt < +oo. (2.7) 

•OO 

The Fourier transform is thus bounded, and one can verify that it is 
a continuous function of co (Problem 2.1). If / is also integrable, the 
following theorem gives the inverse Fourier transform. 
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Theorem 2.1 (Inverse Fourier Transform) If f e L X (M) and f G 

L 1 (M) 2/ien 

1 C + OO 

/(t) = — / (2.8) 



Proof 2 . Replacing /(w) by its integral expression yields 

^ />+oo ^ ^ /"+oo / r+oo \ 

— / /(w) exp (iut) du = — ( / /(u) exp[*n>(i — it)] du ) du. 

27r 2-00 2 7r 2-00 \2-oo / 

We cannot apply the Fubini Theorem A. 2 directly because f(u) exp[*n>(i— 
u)] is not integrable in R 2 . To avoid this technical problem, we multiply 
by exp(— e 2 cc 2 /4) which converges to 1 when e goes to 0. Let us define 

h{t) = 2 ^ [ ■ft' 1 ') ex P ( — — ) ex PM* - «)] duj du. 

(2.9) 

We compute I e in two different ways using the Fubini theorem. The 
integration with respect to u gives 



^ r+oc / (P(jJ 2 \ 

I f (t ) = — J f(aj) exp ( — — — J n ex~p(iut) du. 



Since 



/M exp 



—e 2 u 2 



exp (iu(t-u)) < \f{u)\ 



and since / is integrable, we can apply the dominated convergence The- 
orem A.l, which proves that 

^ r + oo 

lim I e (t) = — / / (u) expUut) du . (2.10) 

e-^0 2 tt J_ 00 

Let us now compute the integral (2.9) differently by applying the Fubini 
theorem and integrating with respect to u: 



/ -\-oo 

9e{t~u)f{u) 

-oo 

gRx) = J exp(mj) exp 



( 2 . 11 ) 



— e 2 o; 2 
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A change of variable c J = euj shows that g e (x) = e~ 1 g\(e~ 1 x), and it is 
proved in (2.32) that gi(x) = 7r -1 / 2 e _,c . The Gaussian g\ has an integral 
equal to 1 and a fast decay. The squeezed Gaussians g e have an integral 
that remains equal to 1, and thus they converge to a Dirac 5 when e goes 
to 0. By inserting (2.11) one can thus verify that 



lim 

f— >o 




I h(t) - f{t ) | dt = lim 



II 9 ‘ lt 



— u 



| f(u) — f{t) | du dt = 0. 



Inserting (2.10) proves (2.8). ■ 

The inversion formula (2.8) decomposes / as a sum of sinusoidal waves 
e iwt Q f am piitude f(ui). By using this formula, we can show (Problem 
2.1) that the hypothesis / G L 1 (M) implies that / must be continuous. 
The reconstruction (2.8) is therefore not proved for discontinuous func- 
tions. The extension of the Fourier transform to the space L 2 (M) will 
address this issue. 

The most important property of the Fourier transform for signal 
processing applications is the convolution theorem. It is another way to 
express the fact that sinusoidal waves e ltu) are eigenvalues of convolution 
operators. 



Theorem 2.2 (Convolution) Let f E L 1 (M) and h E L 1 (M). The 

function g = h-k f is in L 1 (R) and 

c/{uj) = h(u)f{w). (2.12) 

Proof 1 . 

/ +oo / r+ oo \ 

exp(— itix) i f(t — u) h(u ) du ) dt. 

-oo \J — oo / 



Since \f (t—u)\\h(u)\ is integrable in M 2 , we can apply the Fubini Theorem 
A. 2, and the change of variable (t, u) — >■ (v = t — u,u) yields 



/ +oo /»+ OO 

/ exp[— i(u + v)(E\ f(v) h(u) dudv 

-oo J — oo 

/ p + oo \ / p+oo 

/ exp(— ivoo) f(v) dv J I / exp (—iuco) h(u) du 
— oo / \J — oo 



which verifies (2.12). 
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The response Lf = g = f * h of a linear time-invariant system can be 
calculated from its Fourier transform g(uj) = f(uj) h(co) with the inverse 
Fourier formula 

m = T,„ 

which yields 

i p-\-oo 

Lfit) = ;r- / h{u) f(u)e wt clu. (2.14) 

Each frequency component e' ,aJ of amplitude f (to) is amplified or atten- 
uated by h(u). Such a convolution is thus called a, frequency filtering, 
and h is the transfer function of the filter. 

The following table summarizes important properties of the Fourier 
transform, often used in calculations. Most of these formulas are proved 
with a change of variable in the Fourier integral. 



g(u) U ut du, 



(2.13) 



Property 


Function 

fit) 


Fourier Transform 

/M 


Inverse 


m 


2nf{-u) 


(2,15) 


Convolution 


fi*h(t) 


AM AM 


(2.16) 


Multiplication 


hit) hit) 




(2,17) 


Translation 


fit ~ t 0 ) 


e~ itoul f{u) 


(2.18) 


Modulation 


e iwot fit ) 


f{uj - UJ 0 ) 


(2.19) 


Scaling- 


/(f) 


|s| f(su) 


(2.20) 


Time derivatives 


f iP Ht) 


{iu) p f{u) 


(2.21) 


Frequency derivatives 


i-uyfit) 


f ip) i w) 


(2.22) 


Complex conjugate 


nt) 


f*i~u) 


(2.23) 


Hermitian symmetry 


fit) G M 


fi-uj) = f*{u) 


(2.24) 



2.2.2 Fourier Transform in L 2 (R) 

The Fourier transform of the indicator function / 



/ M 



e >xi dt 



2 sin u 






-l 



LO 
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This function is not integrable because / is not continuous, but its 
square is integrable. The inverse Fourier transform Theorem 2.1 thus 
does not apply. This motivates the extension of the Fourier transform 
to the space L 2 (M) of functions / with a finite energy \f(t)\ 2 clt < 
Too. By working in the Hilbert space L 2 (R), we also have access to all 
the facilities provided by the existence of an inner product. The inner 
product of f e L 2 (M) and g G L 2 (M) is 

/ +oo 

f{t,)g*{t)dt, 

•OO 

and the resulting norm in L 2 (R) is 

/ +oo 

I m\ 2 dt. 

•oo 

The following theorem proves that inner products and norms in L 2 (M) 
are conserved by the Fourier transform up to a factor of 2i r. Equations 
(2.25) and (2.26) are called respectively the Parseval and Plancherel 
formulas. 



Theorem 2.3 If f and h are in L 1 (M) n L 2 (R) then 




f(t ) h*(t) dt 



1 

2n 



+oo 



f(co) h*(uj) duj. 



For h = f it follows that 




m\ 2 dt 



i 

2n 




|/(w)| 2 duj. 



(2.25) 



(2.26) 



Proof 1 . Let g = f-kh with h(t) = h*(—t). The convolution Theorem 2.2 
and property (2.23) show that g(co) = f(u)) h*(co). The reconstruction 
formula (2.8) applied to #(0) yields 




f(t)h*(t) 



dt = g{ 0) 




f(uj) h*(co) duo. 
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Density Extension in L 2 (M) If / G L 2 (M) but / ^ L 1 (M), its 
Fourier transform cannot be calculated with the Fourier integral (2.6) 
because f(t) e lLOt is not integrable. It is defined as a limit using the 
Fourier transforms of functions in L 1 (M) n L 2 (M). 

Since L 1 (E) nL 2 (M) is dense in L 2 (M) , one can find a family {f n }ne z 
of functions in L 1 (M) ft L 2 (M) that converges to /: 

lim ||/ — /rill = 0. 

n— >+oo 

Since {f n }nei converges, it is a Cauchy sequence, which means that 
||/n ~ fp II is arbitrarily small if n and p are large enough. Moreover, 
f n G L 1 (R), so its Fourier transform f n is well defined. The Plancherel 
formula (2.26) proves that {f n } n e z is also a Cauchy sequence because 

II fn - fp\\ = V27T \ \f n - f p \\ 

is arbitrarily small for n and p large enough. A Hilbert space (Appendix 
A. 2) is complete, which means that all Cauchy sequences converge to 
an element of the space. Hence, there exists / G L 2 (M) such that 

lim \\f-fn\\ = 0. 

n— >+oo 



By definition, / is the Fourier transform of /. This extension of the 
Fourier transform to L 2 (M) satisfies the convolution theorem, the Par- 
seval and Plancherel formulas, as well as all properties (2.15-2.24). 



Diracs Diracs are often used in calculations; their properties are sum- 
marized in Appendix A. 7. A Dirac S associates to a function its value 
at t = 0. Since c ,ujl = 1 at t = 0 it seems reasonable to define its Fourier 
transform by 

/ +oo 

8{t)e~ i “ t dt = 1. (2.27) 

•OO 

This formula is justified mathematically by the extension of the Fourier 
transform to tempered distributions [66, 69]. 
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2.2.3 Examples 

The following examples often appear in Fourier calculations. They also 
illustrate important Fourier transform properties. 



• The indicator function f = 1 is discontinuous at t = ±T. 
Its Fourier transform is therefore not integrable: 



/ M 




2 sin(Tcij) 

LO 



(2.28) 



• An ideal low-pass filter has a transfer function h = l[-£g] that 
selects low frequencies over [—£,£]. The impulse response is cal- 
culated with the inverse Fourier integral (2.8): 



h{t) 




sin (ft) 
Tit 



(2.29) 



• A passive electronic circuit implements analog filters with resis- 
tances, capacities and inductors. The input voltage f(t.) is related 
to the output voltage g(t ) by a differential equation with constant 
coefficients: 

K M 

X>/ (fc) (*) = X>0 (fc) (*)- (2-30) 

k=0 k = 0 

Suppose that the circuit is not charged for t < 0, which means 
that /(f) = g(t ) = 0. The output g is a linear time-invariant 
function of / and can thus be written g = f -kh. Computing the 
Fourier transform of (2.30) and applying (2.22) proves that 



h(u) = !Kl = EUlMKi . 



(2.31) 



It is therefore a rational function of no. An ideal low-pass transfer 
function thus cannot be implemented by an analog circuit. 

It must be approximated by a rational function. Chebyshev or 
Butterworth filters are often used for this purpose [14], 
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• A Gaussian f(t ) = e ' 2 is a C°° function with a fast asymptotic 
decay. Its Fourier transform is also a Gaussian: 

f(cu) = \AFe- w2/4 . (2.32) 



This Fourier transform is computed by showing with an integra- 
tion by parts that f(uj) = e~^e~ 1ult dt is differentiable and 
satisfies the differential equation 



2/'(a>) + w/M = 0. (2.33) 

~ a ; 2 

The solution of this equation is a Gaussian f(uj) = JiVT, and 
since /( 0) = f^e~ f2 clt = y/ic, we obtain (2.32). 

• A Gaussian chirp f(t ) = exp [— (a — ib)t 2 ] has a Fourier transform 
calculated with a similar differential equation: 



/ M 



7T 



a — ib 



exp 



— (a + ib)uj 2 \ 
4(a 2 + b 2 ) ) ' 



(2.34) 



• A translated Dirac S T (t) = S(t — r) has a Fourier transform cal- 
culated by evaluating e at t = t: 



5 r (uj) = S(t — t) e 



—iwt 



clt = e~ luJT . 



(2.35) 



• The Dirac comb is a sum of translated Diracs 

+oo 

°{t) = 53 w - nT ) 

n =— oo 

that is used to uniformly sample analog signals. Its Fourier trans- 
form is derived from (2.35): 

+oo 

c{co)= J3 e- inTw . (2.36) 

n=— oo 

The Poisson formula proves that it is also equal to a Dirac comb 
with a spacing equal to 2w/T. 
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Theorem 2.4 (Poisson Formula) In the sense of distribution equal- 
ities (A. 32), 



+oo 

^ g — inTui 

n =— oo 




(2.37) 



Proof 2 . The Fourier transform c in (2.36) is periodic with period 2i t/T. 
To verify the Poisson formula, it is therefore sufficient to prove that the 
restriction of c to [—tt/T,tt/T] is equal to 27 t/T d. The formula (2.37) is 
proved in the sense of a distribution equality (A. 32) by showing that for 
any test function <p(uj) with a support included in [— 7r/T, 7 t/T], 



(c, d>) = lim 

N ' A'— S -+00 /_ 



x 



N 



-OO ^ 2yj- 

exp(— inToj) du = — 4>{0). 
50 n=—N T 



The sum of the geometric series is 



N 

exp(— inToj) 

n~—N 



sin[(iV + l/2)Tc;] 
sin[Tct>/2] 



(2.38) 



Hence 



(c, 0) = lim XX f 
N — >+oo T J 



r / T sin[(iV + l/2)Tw] Too / 2 



-7I-/T 



1T(jO 



sin[To;/2] 



4>{oj)doj. (2.39) 



Let 



'0M = 



1 


Tlo/2 


if u; 


sin[Tw/2] 


1 0 




if w 



and '</>(£) be the inverse Fourier transform of 'f(co). Since 2w 1 sin(<m>) is 
the Fourier transform of l[_ a , a ](t), the Parseval formula (2.25) implies 



{c,(f>) = lim 

N — »+oo 1 l_ 



2vr / ,+0 ° sin[(AT + l/2)Tw] 



/ 



TTU) 



Tp{ LO) diO 



= lim 

AT— S-+OO 



27r r^+i/2)T 



2tt r 

~r J_ 



'ip{t) dt. 



I-(N+1/2)T 

When N goes to +oo the integral converges to 0) = <^>(0). 



(2.40) 
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2.3 Properties 1 



2.3.1 Regularity and Decay 



The global regularity of a signal / depends on the decay of |/(a/)| when 
the frequency lo increases. The differentiability of / is studied. If 
/ G L 1 (M), then the Fourier inversion formula (2.8) implies that / is 
continuous and bounded: 



\m\< 



i 

27T 




e luJt f(u))\ duj 



1 

2n 



" + OO 



\f{u)) \ cIlo < +oo 



(2.41) 



The next proposition applies this property to obtain a sufficient condi- 
tion that guarantees the differentiability of / at any order p. 



Proposition 2.1 A function f is bounded and p times continuously 
differentiable with bounded derivatives if 



+oo 



\f{a>)\ (1 + M p ) cIlv < +oo 



(2.42) 



Proof 2 . The Fourier transform of the k th order derivative f^ k \t) is 
(iuj) k f(uj). Applying (2.41) to this derivative proves that 



/ +oo 

I/Ml M* dw. 

-oo 



Condition (2.42) implies that \f (co)\\Lo\ k dco < +oo for any k < p, 
so f (k Ht) is continuous and bounded. ■ 



This result proves that if there exist a constant K and e > 0 such that 



l/M < 



I< 

1 + \uj\P+ 1+e ' 



then 



fe CP 



If / has a compact support then (2.42) implies that / G C°°. 

The decay of |/(o/)| depends on the worst singular behavior of /. 
For example, / = 1 [~t,t\ is discontinuous at t = ±T, so |/(o/)| decays 
like |o/| _1 . In this case, it could also be important to know that f(t.) 
is regular for t. ^ ±T. This information cannot be derived from the 
decay of \f(uj)\. To characterize local regularity of a signal / it is 
necessary to decompose it over waveforms that are well localized in 
time, as opposed to sinusoidal waves e iwt . Section 6.1.3 explains that 
wavelets are particularly well adapted to this purpose. 
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2.3.2 Uncertainty Principle 

Can we construct a function / whose energy is well localized in time 
and whose Fourier transform / has an energy concentrated in a small 
frequency neighborhood? The Dirac 8(t — u ) has a support restricted 
to t = u but its Fourier transform has an energy uniformly spread 
over all frequencies. We know that |/(cj)| decays quickly at high fre- 
quencies only if / has regular variations in time. The energy of / must 
therefore be spread over a relatively large domain. 

To reduce the time spread of /, we can scale it by s < 1 while 
maintaining constant its total energy. If 

fs(t) = ) then ll/sll 2 = ll/ll 2 - 

The Fourier transform f s ( uj) = yfs f{su ) ) is dilated by 1/s so we lose in 
frequency localization what we gained in time. Underlying is a trade-off 
between time and frequency localization. 

Time and frequency energy concentrations are restricted by the 
Heisenberg uncertainty principle. This principle has a particularly im- 
portant interpretation in quantum mechanics as an uncertainty as to 
the position and momentum of a free particle. The state of a one- 
dimensional particle is described by a wave function / £ L 2 (M). The 
probability density that this particle is located at t is p^-|/(t)| 2 . The 

probability density that its momentum is equal to lo is 27T \\f\\ 2 1 ./ ( ^ ) 1 2 • 
The average location of this particle is 

1 /*+oo 

U = JffJ *l/(*)l 2d *’ ( 2 ‘ 43 ) 

and the average momentum is 

1 /*+oo 

WM 2[iw - < 2 -«) 

The variances around these average values are respectively 

1 f + oo 

a 2 = — (2.45) 

J J — OO 
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and 

1 f + oo 

{'2.46) 

The larger a t , the more uncertainty there is concerning the position of 
the free particle; the larger cr w , the more uncertainty there is concerning 
its momentum. 



Theorem 2.5 (Heisenberg Uncertainty) The temporal variance and 
the frequency variance of f € L 2 (M) satisfy 

(2.47) 



This inequality is an equality if and only if there exist (u,£,a,b) G 
M 2 x C 2 such that 

./'(/) (H'^e hil "T (2.48) 



Proof 2 . The following proof due to Weyl [75] supposes that lim| <: |_ ) . +0O yftf (t) 
0, but the theorem is valid for any / G L 2 (M). If the average time and 
frequency localization of / is u and £, then the average time and fre- 
quency location of exp {—ift)f(t + u ) is zero. It is thus sufficient to 
prove the theorem for u = £ = 0. Observe that 



2 2 _ 
07 a... — 



2vr||/|| 4 7- 



/ +GO f + OO 

|t/(t)| 2 ctt / |w/(w)| 2 dw. (2.49) 

-oo J — OO 



Since iu>f(uj) is the Fourier transform of /'(i), the Plancherel identity 
(2.26) applied to iu)f{u)) yields 



X2. 2 _ 



1 /*+oo /*+oo 

i^y it/(t)i 2 ^ y ^ irwi 2 ^. (2.50) 



Schwarz’s inequality implies 
1 



^22 '> 

°i ° uj T 



> 



> 



W 



/ -Too 

|t/ , (t)/*(t)| 

-00 
r +00 f 

/ T/'wfw+r«)/w] 

J — 00 ^ 

, o 

/ +OO 

i(l/(i)| 2 )' f 

-OO 



dt 
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Since lini|^ +00 \/t f(t) = 0, an integration by parts gives 

i [ r + oo 1 2 . 

4- (2 ' 51) 

To obtain an equality, Schwarz’s inequality applied to (2.50) must be an 
equality. This implies that there exists b E C such that 

f(t) = -2btf(t). (2.52) 

Hence, there exists a G C such that f(t) = a exp(— bt 2 ). The other 
steps of the proof are then equalities so that the lower bound is indeed 
reached. When u A 0 and ^ A 0 the corresponding time and frequency 

translations yield (2.48). ■ 

In quantum mechanics, this theorem shows that we cannot reduce ar- 
bitrarily the uncertainty as to the position and the momentum of a free 
particle. In signal processing, the modulated Gaussians (2.48) that have 
a minimum joint time-frequency localization are called Gabor chirps. 
As expected, they are smooth functions with a fast time asymptotic 
decay. 

Compact Support Despite the Heisenberg uncertainty bound, we 
might still be able to construct a function of compact support whose 
Fourier transform has a compact support. Such a function would be 
very useful in constructing a finite impulse response filter with a band- 
limited transfer function. Unfortunately, the following theorem proves 
that it does not exist. 

Theorem 2.6 If f / 0 has a compact support then f(uj) cannot be 
zero on a whole interval. Similarly, if f ^ 0 has a compact support 
then f(t.) cannot be zero on a whole interval. 

Proof 2 . We prove only the first statement, since the second is derived 
from the first by applying the Fourier transform. If / has a compact 
support included in [—5, 6] then 

f{t) = ^ J J{w) exp{iwt)dw. 



(2.53) 
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If f(t) =0 for t G [c, d\, by differentiating n times under the integral at 
to = (c + d)/2, we obtain 

f M (to) = 77 - [ /(w) {iuj) n exp(*wt 0 )dw = 0. (2.54) 

^ 7r J —b 

Since 



1 f b - 

f{t) = — /M exp['iu;(t - t 0 )] exp(iwt 0 ) dw, (2.55) 
^ J —b 

developing exp[iw(t — to)] as an infinite series yields for all t G R 



, +00 



W - to)] 1 



n = 0 



n! 



f /( 



uj) co n exp(iwto) duo = 0. (2.56) 



This contradicts our assumption that / 7 ^ 0. 



2.3.3 Total Variation 

The total variation measures the total amplitude of signal oscillations. 
It plays an important role in image processing, where its value depends 
on the length of the image level sets. We show that a low-pass filter can 
considerably amplify the total variation by creating Gibbs oscillations. 



Variations and Oscillations If / is differentiable, its total variation 
is defined by 

/ +00 

\nt)\dt. (2.57) 

•OO 

If {x p }p are the abscissa of the local extrema of / where f'(x p ) = 0, then 
1 1 / 1 1 v = \f( x p+i) — f( x p) I- It ih us measures the total amplitude of 

the oscillations of /. For example, if f(t) = e~ f2 , then \\f\\v = 2. If 
f(t) = sin(7rt)/(7rt), then / has a local extrema at x p G \p,p+ 1] for any 
p G Z. Since \f(x p+ i) — f(x p )\ ~ |p| _1 , we derive that \\f\\v = +cc. 

The total variation of non-differentiable functions can be calcu- 
lated by considering the derivative in the general sense of distributions 
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[66, 79] . This is equivalent to approximating the derivative by a finite 
difference on an interval h that goes to zero: 

ll/llv = B (2.58) 

J _ QO \h\ 

The total variation of discontinuous functions is thus well defined. For 
example, if / = 1^] then (2.58) gives \\f\\v = 2. We say that / has a 
bounded variation if \\f\\v < +oo. 

Whether f is the standard derivative of / or its generalized deriva- 
tive in the sense of distributions, its Fourier transform is f'(uj) = 
iu)f(uj). Hence 

/ +oo 

\m\dt = \\f\\v , 

■00 

which implies that 

I/Ml < • ( 2 -59) 

\D | 

However, |/(cj)| = 0(\oj\~ 1 ) is not a sufficient condition to guarantee 
that / has bounded variation. For example, if f(t) = sin(7rt)/(7rt), then 
/ = l[- 7 r, 7 r] satisfies |/(w)| < 7r | cj | — 1 although ||/||y = +oo. In general, 
the total variation of / cannot be evaluated from |/(cj)|. 

Discrete Signals Let fjv[n] = f(n/N) be a discrete signal obtained 
with a uniform sampling at intervals iV _1 . The discrete total variation 
is calculated by approximating the signal derivative by a finite difference 
over the sampling distance h = 1 . and replacing the integral (2.58) 

by a Riemann sum, which gives: 

W/nWv = \f N [n] - /w[n- 1]| . (2.60) 

n 

If n p are the abscissa of the local extrema of f N , then 
ll/wllv = ^ |/iv[^p+l] — /v’[«p] | • 

p 

The total variation thus measures the total amplitude of the oscillations 
of /. In accordance with (2.58), we say that the discrete signal has a 
bounded variation if ||/iv||v is bounded by a constant independent of 
the resolution N. 
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Gibbs Oscillations Filtering a signal with a low-pass filter can cre- 
ate oscillations that have an infinite total variation. Let /e = f -kh^ be 
the filtered signal obtained with an ideal low-pass filter whose transfer 
function is If / G L 2 (R), then /t converges to / in L 2 (M) 

norm: lim^ +00 ||/ — /e|| = 0. Indeed, /e = / l[-£,£] and the Plancherel 
formula (2.26) implies that 

1 f +0 ° - 1 r 

ll/-/dl 2 = ^ / I/M - /sM| 2 dcj = — / I/Ml 

i-oo i7r /|u|>? 

which goes to zero as £ increases. However, if / is discontinuous in 
to, then we show that /e has Gibbs oscillations in the neighborhood 
of to, which prevents sup teB; | f(t) — f^(t) | from converging to zero as £ 
increases. 

Let / be a bounded variation function \\f\\v < +oo that has an 
isolated discontinuity at t. o, with a left limit /(to ) and right limit /(t/). 
It is decomposed as a sum of f c , which is continuous in the neighborhood 
of to, plus a Heaviside step of amplitude /(t/) — /(t/): 

f(t) = fc{t) + [/(# ) - /(t 0 ")]«(t-to), 

with 

“<*> = { 0 otherwise ' < 2 ' 61 > 

Hence 

fi(t) = fc*h(t) + [/(#) - /(^ )]u*/i f (t- t 0 ). (2.62) 

Since f c has bounded variation and is uniformly continuous in the neigh- 
borhood of to, one can prove (Problem 2.13) that f c *h^(t) converges 
uniformly to / c (t) in a neighborhood of to- The following proposition 
shows that this is not true for u-kh £, which creates Gibbs oscillations. 



Proposition 2.2 (Gibbs) For any / > 0, 



u 




sm :t , 

ox. 

7 IX 



(2.63) 
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Proof 2 . The impulse response of an ideal low-pass filter, calculated in 
(2.29), is h^(t) = sin(£t) / (nt) . Hence 



u * 




«(r) 



sin f lt ~fdr 

7T (t ~ t) 




7T (t ~ t) 



The change of variable x 



f(t — t) gives (2.63). 



f{t) f -k hir,(t) f * fl2n{t) 

















Figure 2.1: Gibbs oscillations created by low-pass filters with cut-off 
frequencies that decrease from left to right. 



The function 



s{C) 




sin x 

ax 

7TX 



is a sigmoid that increases from 0 at t = — oo to 1 at t = +oo, with 
s(0) = 1/2. It has oscillations of period 7r/£, which are attenuated 
when the distance to 0 increases, but their total variation is infinite: 
1 1 s 1 1 y = +oo. The maximum amplitude of the Gibbs oscillations occurs 
at t = ±7 r/G with an amplitude independent of £: 



A = S ( 7T ) — 1 




sin a: 
ax 

7TX 



1 « 0.045 . 



Inserting (2.63) in (2.62) shows that 



f(t) ~ h(t) = [f(to) ~ o )}s{£{t-t 0 )) + e(£,t) , (2.64) 



where lim^ +00 sup| t _ to | <Q , |e(£, t)\ = 0 in some neighborhood of size 
a > 0 around t 0 . The sigmoid s(£(t — t 0 )) centered at t 0 creates a 
maximum error of fixed amplitude for all £. This is seen in Figure 
2.1, where the Gibbs oscillations have an amplitude proportional to 
the jump — /(^ ) at all frequencies £. 
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Image Total Variation The total variation of an image f(x i,x 2 ) 
depends on the amplitude of its variations as well as the length of the 
contours along which they occur. Suppose that f(xi,x 2 ) is differen- 
tiable. The total variation is defined by 

\\f\\v = J I \Vf(x 1 ,x 2 )\dx 1 dx 2 (2.65) 



where the modulus of the gradient vector is 



|V/(.ri,.'r 2 )| 



df{x 1 ,x 2 ) 

dxi 



df{x 1 , 3 : 2 ) 
Ox 2 



1/2 



As in one dimension, the total variation is extended to discontinuous 
functions by taking the derivatives in the general sense of distributions. 
An equivalent norm is obtained by approximating the partial deriva- 
tives by finite differences: 



|A, ffri — 1 


f{x i,x 2 ) - f(x 1 - h,x 2 ) 


\^hJ v^l ? ^2) \ 1 


h 



f{x i,x 2 ) - f{x 1 , 3:2 - h) 
h 



1/2 



One can verify that 

\\f\\v < lim [ I \A h f(x 1 ,x 2 )\dx 1 dx 2 <V2\\f\\ v . 

h->-0 J J 



( 2 . 66 ) 



The finite difference integral gives a larger value when f(x i,x 2 ) is dis- 
continuous along a diagonal line in the (xi,x 2 ) plane. 

The total variation of / is related to the length of it level sets. Let 
us define 

Qy = {( a ’i, X 2 ) G M 2 : f{xi,x 2 )>y}. 

If / is continuous then the boundary 0Q y of Q y is the level set of all 
($iiX 2 ) such that f(xi,x 2 ) = y. Let H l (0Et y ) be the length of DQ y . 
Formally, this length is calculated in the sense of the monodimensional 
Hausdorff measure. The following theorem relates the total variation 
of / to the length of its level sets. 
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Theorem 2.7 (Co-area Formula) 7/||/||y < +oo then 

/ +oo 

H\dO y )dy. (2.67) 

■00 

Proof 2 . The proof is a highly technical result that is given in [79]. We 
give an intuitive explanation when / is continuously differentiable. In 
this case dPl y is a differentiable curve x{y,s) E M 2 , which is parameter- 
ized by the arc-length s. Let t(x) be the vector tangent to this curve 
in the plane. The gradient V/(rr) is orthogonal to f(x). The Prenet 
coordinate system along dfi y is composed of t(x) and of the unit vector 
n(x) parallel to V f{x). Let ds and dn be the Lebesgue measures in the 
direction of r and n. We have 

\Vf(x)\=Vf(x).h = ^, (2.68) 

where dy is the differential of amplitudes across level sets. The idea of 
the proof is to decompose the total variation integral over the plane as 
an integral along the level sets and across level sets, which we write: 

\\f\\v = [ f \Nf{xi f x 2 )\ dxi dx 2 = f f \Nf{x{y,s))\dsdn. (2.69) 
J J J J dQy 

By using (2.68) we can get 

\\f\\v =11 dsd v ■ 

J J dCly 

But f gQ ds = ff 1 (dfl y ) is the length of the level set, which justifies 
(2.67). * ■ 

The co-area formula gives an important geometrical interpretation of 
the total image variation. Images are uniformly bounded so the integral 
(2.67) is calculated over a finite interval and is proportional to the 
average length of level sets. It is finite as long as the level sets are not 
fractal curves. Let f = a 1q be proportional to the indicator function 
of a set O C M 2 which has a boundary of length L. The co-area 
formula (2.7) implies that ||/||v = a L. In general, bounded variation 
images must have step edges of finite length. 
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Figure 2.2: (a): The total variation of this image remains nearly con- 
stant when the resolution N increases, (b): Level sets ()Q y obtained by 
sampling uniformly the amplitude variable y. 



Discrete Images A camera measures light intensity with photore- 
ceptors that perform a uniform sampling over a grid that is supposed 
to be uniform. For a resolution N, the sampling interval is iV _1 and the 
resulting image can be written //v[ni,n 2 ] = /(ni/iV, n 2 /iV). Its total 
variation is defined by approximating derivatives by finite differences 
and the integral (2.66) by a Riemann sum: 



ni 7i2 



/[ni,n 2 ] - /[ni - l,n 2 ] 




/[ni,n 2 ] - /[ni,n 2 



4 ) 



2 \ 1/2 



(2.70) 



In accordance with (2.66) we say that the image has bounded variation 
if UnW v is bounded by a constant independent of the resolution N. 
The co-area formula proves that it depends on the length of the level 
sets as the image resolution increases. The \[2 upper bound factor in 
(2.66) comes from the fact that the length of a diagonal line can be 
increased by \f2 if it is approximated by a zig-zag line that remains on 
the horizontal and vertical segments of the image sampling grid. Figure 
2.2(a) shows a bounded variation image and Figure 2.2(b) displays the 
level sets obtained by discretizing uniformly the amplitude variable 




68 



CHAPTER 2. FOURIER KINGDOM 



y. The total variation of this image remains nearly constant as the 
resolution varies. 



2.4 Two-Dimensional Fourier Transform 1 



The Fourier transform in M” is a straightforward extension of the one- 
dimensional Fourier transform. The two-dimensional case is briefly 
reviewed for image processing applications. The Fourier transform of a 
two-dimensional integrable function / G L 1 (M 2 ) is 



/(w I1W2) = 



r>+oo />+oo 



f(x 1 , £ 2 ) exp[— i(uJiXi + ^ 2 X 2 )} dx\ clx 2 . (2.71) 



' —OO J —OO 



In polar coordinates exp /(aq.r + (J 2 y)\ can be rewritten 

exp^Wi^i + 0 J 2 X 2 )} = exp[ip(xi cos 9 + x 2 sin 9)] 

with p = \J + ay. It is a plane wave that propagates in the di- 
rection of 9 and oscillates at the frequency p. The properties of a 
two-dimensional Fourier transform are essentially the same as in one 
dimension. We summarize a few important results. 

• If / G L 1 (R 2 ) and / G L 1 (R 2 ) then 



f( Xl ,X2) = -^ JJ f(uji,ui2) exp[i(u)iXi+U2X2)]duidu2. (2.72) 

If / G L 1 (R 2 ) and h G L 1 (M 2 ) then the convolution 

g{x 1 , x 2 ) = f*h(x i,x 2 ) = f ( /(«i, u 2 ) h(xi - u\, x 2 - u 2 ) du\ du 2 



has a Fourier transform 



g{u) 1 , w 2 ) = f{uJ 1 , lo 2 ) h(u) U u) 2 ). 



The Parseval formula proves that 



f(x i,x 2 ) g*{x i,x 2 ) dx 1 dx 2 = 



(2.73) 



(2.74) 



— / / f(cu 1 ,uj2)g*(^iiiX2)duj 1 duj2 




